Update PyTorch pin to the version which supports elapsed_time; remove our patch for elapsed_time
#2952
Conversation
…our patch for elapsed_time
Signed-off-by: Anatoly Myachev <[email protected]>
@chengjunlu there are many messages like:

@pbchekin do you know why? (from https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/12196498158)

```
FAILED [4.7235s] inductor/test_triton_kernels.py::CustomOpTests::test_autotune_unbacked - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/intel/oneapi/bin/icpx'
```
…-triton into amyachev/issue2945
Signed-off-by: Anatoly Myachev <[email protected]>
There is no such file, indeed. This file is located at:
Signed-off-by: Anatoly Myachev <[email protected]>
I found the reason: it's because of how PyTorch searches for the SYCL home, and it probably relates to pytorch/pytorch@4742080. Ref to PyTorch: https://github.com/pytorch/pytorch/blame/5872a8c6b00a5c9e45ac4bc99a5c87b93a93aa94/torch/utils/cpp_extension.py#L147

```python
def _find_sycl_home() -> Optional[str]:
    """Find the OneAPI install path."""
    # Guess #1
    sycl_home = os.environ.get('ONEAPI_ROOT')
    if sycl_home is None:
        # Guess #2
        icpx_path = shutil.which('icpx')
        if icpx_path is not None:
            sycl_home = os.path.dirname(os.path.dirname(
                os.path.realpath(icpx_path)))
    if sycl_home and not torch.xpu.is_available():
        print(f"No XPU runtime is found, using ONEAPI_ROOT='{sycl_home}'",
              file=sys.stderr)
    return sycl_home
```
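Guess #2 above can be sketched with plain `os.path` calls; the install layout below is a hypothetical example, not a real installation. If `icpx` is found at a nested location such as `.../compiler/latest/bin/icpx`, stripping two path components yields that nested directory as the presumed oneAPI root rather than the top-level one, which can lead to derived tool paths that do not exist on disk.

```python
import os

# Hypothetical install layout, for illustration only: icpx often lives in a
# nested compiler directory rather than directly under the oneAPI root.
icpx_path = "/opt/intel/oneapi/compiler/latest/bin/icpx"

# PyTorch's guess #2: strip two path components from the icpx location.
sycl_home = os.path.dirname(os.path.dirname(icpx_path))
print(sycl_home)  # /opt/intel/oneapi/compiler/latest, not /opt/intel/oneapi
```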
@pbchekin any chance that […]? My guess comes from pytorch/pytorch#142242 (comment).
Nope:
Signed-off-by: Anatoly Myachev <[email protected]>
Inductor CI with changes from PR 2962: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/12221768210
This warning information is a DPC++ feature. Let me check with the torch team how to disable it.
Signed-off-by: Anatoly Myachev <[email protected]>
This reverts commit 6e3c80b.
Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
This reverts commit c13270d.
Hi @guangyey! After this change pytorch/pytorch#135567 our tutorials started to fail with `RuntimeError: Overflow when unpacking long`. A quick search through the PyTorch codebase gave me the idea that the problem is in `THPUtils_unpackLong`. Example:

```python
# Each pointer is obtained through the `tensor.data_ptr()` function.
d_a_ptrs = torch.tensor([18374686479673720832, 18374967954644140032, 18374967954645188608, 18374967954645450752], device=device)  # <- failed
```

Stack trace:

```
#0  0x00007fffed0824a1 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007fffec20b246 in THPUtils_unpackLong(_object*) () from .../intel-xpu-backend-for-triton/.scripts_cache/pytorch/torch/lib/libtorch_python.so
#2  0x00007fffec9b7911 in torch::utils::store_scalar(void*, c10::ScalarType, _object*) ()
    from .../intel-xpu-backend-for-triton/.scripts_cache/pytorch/torch/lib/libtorch_python.so
#3  0x00007fffec9c1af8 in torch::utils::(anonymous namespace)::recursive_store(char*, c10::ArrayRef<long>, c10::ArrayRef<long>, long, c10::ScalarType, unsigned long, _object*) [clone .isra.0] () from .../intel-xpu-backend-for-triton/.scripts_cache/pytorch/torch/lib/libtorch_python.so
#4  0x00007fffec9c3460 in torch::utils::(anonymous namespace)::internal_new_from_data(c10::TensorOptions, c10::ScalarType, std::optional<c10::Device>, _object*, bool, bool, bool, bool) () from .../intel-xpu-backend-for-triton/.scripts_cache/pytorch/torch/lib/libtorch_python.so
#5  0x00007fffec9c8d77 in torch::utils::tensor_ctor(c10::DispatchKey, c10::ScalarType, torch::PythonArgs&) ()
    from .../intel-xpu-backend-for-triton/.scripts_cache/pytorch/torch/lib/libtorch_python.so
#6  0x00007fffec4ed8f2 in torch::autograd::THPVariable_tensor(_object*, _object*, _object*) ()
    from .../intel-xpu-backend-for-triton/.scripts_cache/pytorch/torch/lib/libtorch_python.so
#7  0x00005555556985a6 in cfunction_call (func=0x7ffff7631800, args=<optimized out>, kwargs=<optimized out>)
```
Simplified example:

```python
import torch

test = torch.rand((10, 10), device="xpu", dtype=torch.float16)
test_ptr = test.data_ptr()
torch.tensor(test_ptr, device="xpu")  # <- RuntimeError: Overflow when unpacking long
```
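The overflow can be illustrated without any GPU: `data_ptr()` returns a plain Python int that can exceed the signed 64-bit range, which the C `long` used in the unpack path cannot hold. A minimal stdlib sketch using the pointer value from the example above:

```python
import ctypes

# Pointer value taken from the failing example above.
ptr = 18374686479673720832
INT64_MAX = 2**63 - 1

print(ptr > INT64_MAX)            # True: it does not fit a signed 64-bit long
print(ctypes.c_int64(ptr).value)  # -72057594035830784: the wrapped signed value
```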
Signed-off-by: Anatoly Myachev <[email protected]>
I decided to try specifying the dtype directly, as suggested in pytorch/pytorch#135628. @guangyey do I understand correctly that it is now recommended to use this approach in the code?
In my understanding, specifying the dtype is the correct approach. This is because we changed the behavior there; by default, integer inputs are inferred as `torch.int64`:

```python
>>> b = torch.tensor([1, 2])
>>> b.dtype
torch.int64
```
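A stdlib sketch of why the default `int64` inference fails for pointer values while an unsigned 64-bit interpretation works; `struct` stands in here for PyTorch's internal scalar store as an analogy only:

```python
import struct

ptr = 18374686479673720832  # pointer value from the example above

# Packing as a signed 64-bit integer fails, analogous to THPUtils_unpackLong:
try:
    struct.pack("<q", ptr)
except struct.error as e:
    print("signed int64:", e)

# The unsigned 64-bit representation holds the full value, which is why
# choosing the dtype explicitly avoids the default int64 inference:
print(struct.pack("<Q", ptr) == ptr.to_bytes(8, "little"))  # True
```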
guangyey left a comment:
LGTM.
PyTorch commit:
(`RuntimeError: Overflow when unpacking long`)
Inductor CI:
According to https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/12242518858/job/34150030016 it's fine to go.